Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

Authors

Abstract

Most recently, there has been significant interest in learning contextual representations for various NLP tasks, by leveraging large-scale text corpora to train powerful language models with self-supervised objectives, such as Masked Language Model (MLM). However, based on a pilot study, we observe three issues of existing general-purpose language models when they are applied to text-to-SQL semantic parsers: they fail to detect column mentions in the utterances, fail to infer column mentions from cell values, and fail to compose complex target SQL queries. To mitigate these issues, we present a model pre-training framework, Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate high-quality pre-train data. GAP is trained on 2 million utterance-schema pairs and 30K utterance-schema-SQL triples, whose utterances are produced by generative models. Based on experimental results, neural semantic parsers that leverage GAP as a representation encoder obtain new state-of-the-art results on both the Spider and Criteria-to-SQL benchmarks.
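
As a rough illustration of the setup the abstract describes, the sketch below shows one way to build a joint utterance-schema input and apply MLM-style masking to it. This is a minimal sketch under stated assumptions, not the paper's exact recipe: the "facebook/bart-base" backbone is a stand-in checkpoint, and the serialization format, the literal "<table>"/"<col>" markers, and the 15% mask rate are illustrative choices.

```python
import random

from transformers import AutoTokenizer

# Hypothetical stand-in backbone; GAP's actual checkpoint/config may differ.
tok = AutoTokenizer.from_pretrained("facebook/bart-base")

utterance = "How many singers are older than 30?"
schema = {"singer": ["singer_id", "name", "country", "age"]}

# Linearize the table schema after the utterance so the encoder sees both
# jointly, which is what lets it learn utterance-to-column alignment.
# The "<table>"/"<col>" markers are assumptions, tokenized here as plain text.
schema_text = " ".join(
    f"<table> {table} <col> " + " <col> ".join(columns)
    for table, columns in schema.items()
)
input_ids = tok(utterance + " " + schema_text)["input_ids"]
labels = list(input_ids)  # MLM targets: the uncorrupted token sequence

# MLM-style corruption: mask a random subset of non-special tokens.
MASK_RATE = 0.15  # assumed rate, in the spirit of standard MLM pre-training
for i, token_id in enumerate(input_ids):
    if token_id not in tok.all_special_ids and random.random() < MASK_RATE:
        input_ids[i] = tok.mask_token_id

print(tok.decode(input_ids))
```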

Related Papers

Learning Structured Natural Language Representations for Semantic Parsing

We introduce a neural semantic parser which converts natural language utterances to intermediate representations in the form of predicate-argument structures, which are induced with a transition system and subsequently mapped to target domains. The semantic parser is trained end-to-end using annotated logical forms or their denotations. We achieve the state of the art on SPADES and GRAPHQUESTIO...

Learning for Semantic Parsing

Semantic parsing is the task of mapping a natural language sentence into a complete, formal meaning representation. Over the past decade, we have developed a number of machine learning methods for inducing semantic parsers by training on a corpus of sentences paired with their meaning representations in a specified formal language. We have demonstrated these methods on the automated constructio...

Multilingual Semantic Parsing: Parsing Multiple Languages into Semantic Representations

We consider multilingual semantic parsing – the task of simultaneously parsing semantically equivalent sentences from multiple different languages into their corresponding formal semantic representations. Our model is built on top of the hybrid tree semantic parsing framework, where natural language sentences and their corresponding semantics are assumed to be generated jointly from an underlyi...

Learning Discrete Representations via Information Maximizing Self-Augmented Training

Learning discrete representations of data is a central machine learning task because of the compactness of the representations and ease of interpretation. The task includes clustering and hash learning as special cases. Deep neural networks are promising to be used because they can model the non-linearity of data and scale to large datasets. However, their model complexity is huge, and therefor...

Learning Discrete Representations via Information Maximizing Self-Augmented Training

Our method is related to denoising auto-encoders (Vincent et al., 2008). Auto-encoders maximize a lower bound of mutual information (Cover & Thomas, 2012) between inputs and their hidden representations (Vincent et al., 2008), while the denoising mechanism regularizes the auto-encoders to be locally invariant. However, such a regularization does not necessarily impose the invariance on the hidd...

Journal

Journal Title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i15.17627